10/30/2018

Goals for this week

  • One factor ANOVA
  • Git and GitHub
  • Means tests in ANOVA
  • Experimental Design
  • Power analyses
  • Multi-factor ANOVA

ANOVA

ANOVA

  • Stands for ANalysis of VAriance
  • Core statistical procedure in biology
  • Developed by R.A. Fisher in the early 20th Century
  • The core idea is to ask how much variation exists within vs. among groups
  • ANOVAs are linear models with a categorical predictor and a continuous response variable
  • The categorical predictors are often called factors, and can have two or more levels (important to specify in R)
  • Each factor will have a hypothesis test
  • The levels of each factor may also need to be tested
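Because specifying levels matters in R, here is a minimal sketch of encoding a categorical predictor as a factor (the treatment labels below are hypothetical):

```r
# Hypothetical treatment codes read in as character strings
treatment <- c("isolated", "companion", "companion_injected", "isolated")

# Convert to a factor and set the level order explicitly;
# the first level becomes the reference (intercept) in a linear model
treatment <- factor(treatment,
                    levels = c("isolated", "companion", "companion_injected"))
levels(treatment)   # level order as specified
nlevels(treatment)  # 3
```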

ANOVA

Let’s start with an example

  • Percent time that male mice experiencing discomfort spent “stretching”.
  • Data are from an experiment in which mice experiencing mild discomfort (result of injection of 0.9% acetic acid into the abdomen) were kept in:
    • isolation
    • with a companion mouse not injected or
    • with a companion mouse also injected and exhibiting “stretching” behaviors associated with discomfort
  • The results suggest that mice stretch the most when a companion mouse is also experiencing mild discomfort. Mice experiencing pain appear to “empathize” with co-housed mice also in pain.

From Langford, D. J., et al. 2006. Science 312: 1967-1970

ANOVA

Let’s start with an example

In words:

stretching = intercept + treatment






- The model statement includes a response variable, a constant, and an explanatory variable.
- The only difference with regression is that here the explanatory variable is categorical.
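In R syntax the model statement looks just like a regression; `lm()` (or `aov()`) supplies the intercept automatically. The data below are made up purely to show the form of the call:

```r
# Hypothetical data frame, one row per mouse:
# 'stretching' = percent time spent stretching (continuous response)
# 'treatment'  = housing condition (categorical predictor)
mice <- data.frame(
  stretching = c(12, 15, 9, 11, 28, 35),
  treatment  = factor(c("isolated", "isolated",
                        "companion", "companion",
                        "both_injected", "both_injected"))
)

# stretching = intercept + treatment
fit <- lm(stretching ~ treatment, data = mice)
summary(fit)   # coefficients are differences from the reference level
```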

ANOVA

Let’s start with an example

ANOVA

ANOVA

Conceptually similar to regression

ANOVA

Statistical results table

ANOVA

F-ratio calculation

ANOVA

F-ratio calculation
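As a sketch of the calculation, with made-up numbers: the F-ratio is the mean square among groups divided by the mean square within groups.

```r
# Hypothetical data: 3 groups of 5 observations each
y     <- c(4, 5, 6, 5, 4,  7, 8, 9, 8, 7,  2, 3, 2, 3, 2)
group <- factor(rep(c("A", "B", "C"), each = 5))

k <- nlevels(group)          # number of groups
N <- length(y)               # total sample size
grand_mean  <- mean(y)
group_means <- tapply(y, group, mean)
n_per       <- tapply(y, group, length)

# Sums of squares
SS_among  <- sum(n_per * (group_means - grand_mean)^2)
SS_within <- sum((y - group_means[as.character(group)])^2)

# Mean squares and the F-ratio
MS_among  <- SS_among  / (k - 1)    # numerator df = k - 1
MS_within <- SS_within / (N - k)    # denominator df = N - k
F_ratio   <- MS_among / MS_within

# Should match the F value reported by:
anova(lm(y ~ group))
```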

R INTERLUDE

One way ANOVA

  • Use the RNAseq_lip.tsv data again.
  • Let’s test for an effect of Population on Gene01 expression levels
  • First, let’s look at how the data are distributed
RNAseq_Data <- read.table('RNAseq_lip.tsv', header=T, sep='\t')
g1 <- RNAseq_Data$Gene01
Pop <- factor(RNAseq_Data$Population)  # ensure Population is treated as a factor
boxplot(g1~Pop, col=c("blue","green"))

Or, to plot all points:

stripchart(g1~Pop, vertical=T, pch=19, col=c("blue","green"), 
           at=c(1.25,1.75), method="jitter", jitter=0.05)
Pop_Anova <- aov(g1 ~ Pop)
summary(Pop_Anova)

R INTERLUDE

One way ANOVA

ANOVA

One or more predictor variables

  • One-way ANOVAs just have a single factor
  • Multi-factor ANOVAs
    • Factorial - two or more factors and their interactions
    • Nested - the levels of one factor are contained within the levels of another factor
    • The models can be quite complex
  • ANOVAs use an F-statistic to test factors in a model
    • Ratio of two variances (numerator and denominator)
    • The numerator and denominator d.f. need to be included (e.g. F(1, 34) = 29.43)
  • Determining the appropriate test ratios for complex ANOVAs takes some work
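Given an F value and its two degrees of freedom, the p-value is the upper-tail area of the F distribution. Using the example statistic above:

```r
# p-value for F(1, 34) = 29.43: upper tail of the F distribution
p_value <- pf(29.43, df1 = 1, df2 = 34, lower.tail = FALSE)
p_value   # well below 0.001
```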

ANOVA

Assumptions

  • Normally distributed groups
    • robust to non-normality if equal variances and sample sizes
  • Equal variances across groups
    • okay if largest-to-smallest variance ratio < 3:1
    • problematic if there is a mean-variance relationship among groups
  • Observations in a group are independent
    • randomly selected
    • don’t confound group with another factor
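These assumptions can be checked in R before interpreting the ANOVA; a sketch with hypothetical data:

```r
# Hypothetical data: response y measured in three groups
y     <- c(4.1, 5.0, 6.2, 5.5, 4.8,
           7.3, 8.1, 7.7, 8.4, 7.0,
           2.2, 3.1, 2.8, 3.5, 2.4)
group <- factor(rep(c("A", "B", "C"), each = 5))

# Equal variances: Bartlett's test (itself sensitive to non-normality)
bartlett.test(y ~ group)

# Rule-of-thumb variance ratio (largest / smallest should be < ~3)
vars <- tapply(y, group, var)
max(vars) / min(vars)

# Normality: fit the model, then inspect the residuals
fit <- aov(y ~ group)
shapiro.test(residuals(fit))
plot(fit)   # diagnostic plots: residuals vs fitted, Q-Q plot, etc.
```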

Different ways to include factors in models

ANOVA

Fixed effects of factors

  • Groups are predetermined, of direct interest, repeatable.
  • For example:
    • medical treatments in a clinical trial
    • predetermined doses of a toxin
    • age groups in a population
    • habitat, season, etc.
  • Any conclusions reached in the study about differences among groups can be applied only to the groups included in the study.
  • The results cannot be generalized to other treatments, habitats, etc. not included in the study.

ANOVA

Random effects of factors

  • Measurements that come in groups. A group can be:
    • a family made up of siblings
    • a subject measured repeatedly
    • a transect of quadrats in a sampling survey
    • a block of an experiment done at a given time
  • Groups are assumed to be randomly sampled from a population of groups.
  • Therefore, conclusions reached about groups can be generalized to the population of groups.
  • With random effects, the variance among groups is the main quantity of interest, not the specific group attributes.

ANOVA

Random effects of factors

  • Below are cases where you are likely to treat factors as random effects
  • Whenever your sampling design is nested
    • quadrats within transects
    • transects within woodlots
    • woodlots within districts
  • Whenever you divide up plots and apply separate treatments to subplots
  • Whenever your replicates are grouped spatially or temporally
    • in blocks
    • in batches
  • Whenever you take measurements on related individuals
  • Whenever you measure subjects or other sampling units repeatedly

ANOVA

Random effects of factors

ANOVA

Random effects - test your understanding

  • Factor is sex (Male vs. Female)
  • Factor is fish tank (10 tanks in an experiment)
  • Factor is family (measure multiple sibs per family)
  • Factor is temperature (10 arbitrary temps over natural range)

ANOVA

Caution about fixed vs. random effects

  • Using fixed vs. random effects changes the way that statistical tests are performed in ANOVA
  • Most statistical packages assume that all factors are fixed unless you instruct it otherwise
  • Designating factors as random takes extra work and probably a read of the manual
  • In R, lm assumes that all effects are fixed
  • For random effects, use lme instead (part of the nlme package)
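A sketch of the distinction in R, with hypothetical fish-tank data (the variable names are invented for illustration):

```r
library(nlme)  # lme() lives here

# Hypothetical data: 2 treatments, 4 tanks per treatment, 5 fish per tank
set.seed(1)
d <- data.frame(
  treatment = factor(rep(c("control", "exposed"), each = 20)),
  tank      = factor(rep(1:8, each = 5)),
  y         = rnorm(40)
)

# Treatment is fixed; tank is a random grouping factor,
# because fish within a tank are not independent
fit_mixed <- lme(y ~ treatment, random = ~ 1 | tank, data = d)
summary(fit_mixed)
```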

Git and GitHub

Git and GitHub

Means test to compare levels of a factor

Comparing means when a factor has more than two levels

  • The F-ratio test for a single-factor ANOVA tests for any difference among groups.
  • If we want to understand specific differences, we need further “contrasts”.
  • Unplanned comparisons (post hoc):
    • Multiple comparisons carried out after the results are obtained.
    • Used to find where the differences lie (which means differ from which other means)
    • Comparisons require protection against inflated Type I error rates:
      • Tukey tests: compare all pairs of means and control for multiple comparisons
      • ScheffĂ© contrasts: compare all combinations of means
  • Planned comparisons (a priori):
    • Comparisons between group means that were decided when the experiment was designed (not after the data were in)
    • Must be few in number to avoid inflating Type I error rates
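The Tukey procedure is built into base R; a sketch with hypothetical data:

```r
# Hypothetical one-way layout: 3 groups, 6 observations each
set.seed(42)
y     <- c(rnorm(6, mean = 5), rnorm(6, mean = 7), rnorm(6, mean = 5.5))
group <- factor(rep(c("A", "B", "C"), each = 6))

fit <- aov(y ~ group)

# All pairwise comparisons, with the family-wise error rate controlled
TukeyHSD(fit)
```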

Planned (a priori) contrasts

  • A well-planned experiment often dictates which comparisons of means are of most interest, whereas other comparisons are of no interest.
  • By restricting the comparisons to just the ones of interest, researchers can mitigate the multiple testing problem associated with post-hoc tests.
  • Some statisticians argue that planned comparisons allow researchers to avoid adjusting p-values altogether, because each planned test was specified independently in advance.
  • Contrasts can also allow more complicated tests of the relationships among means.
  • Coding a priori contrasts in R is quite easy and just depends upon writing the right series of coefficient contrasts.

Planned (a priori) contrasts

Understand the coefficients table

R INTERLUDE

Planned contrasts

  • Take the RNAseq data you’ve examined before and create a new four level genotype by combining genotype and microbiota treatment into a single variable
  • Think about how to do this using dplyr functions.
RNAseq_Data <- read.table("RNAseq.tsv", header=T, sep='\t')

x <- factor(RNAseq_Data$categorical_var)  # contrasts() requires a factor
y <- RNAseq_Data$continuous_var1
z <- RNAseq_Data$continuous_var2
  • Set up the a priori contrasts specifically testing one group mean against another
  • These are just examples - you should figure out the logic of the contrasts
  • Confirm that the contrasts are orthogonal
contrasts(x) <- cbind(c(0, 1, 0, -1), c(2, -1, 0, -1), c(-1, -1, 3, -1))
round(crossprod(contrasts(x)), 2)

R INTERLUDE

Planned contrasts

  • Define the contrast labels
  • Then fit the fixed-effects model
# Label the three contrast columns (labels here are placeholders)
colnames(contrasts(x)) <- c("contrast1", "contrast2", "contrast3")
RNAseq_aov_fixed <- aov(y ~ x)
plot(RNAseq_aov_fixed)
boxplot(y ~ x)
# 'split' partitions the factor sum of squares among the contrasts;
# the list names are display labels, the values are contrast column indices
summary(RNAseq_aov_fixed,
        split = list(x = list("contrast1" = 1, "contrast2" = 2, "contrast3" = 3)))

R INTERLUDE

Unplanned contrasts

  • Read in the perchlorate data from Week 3
  • Let’s assess the effects of the 4 perchlorate levels on T4
  • Which perchlorate levels differ in their effect on T4?
perc <- read.table('perchlorate_data.tsv', header=T, sep='\t')

x <- factor(perc$Perchlorate_Level)  # 4 levels; the Tukey step below needs a factor
y <- log10(perc$T4_Hormone_Level)

MyANOVA <- aov(y ~ x)
summary(MyANOVA)
boxplot(y ~ x)

install.packages("multcomp")
library(multcomp)

summary(glht(MyANOVA, linfct = mcp(x = "Tukey")))

Design principles for planning a good experiment

What is an experimental study?

  • In an experimental study the researcher assigns treatments to units
  • In an observational study nature does the assigning of treatments to units
  • The crucial advantage of experiments derives from the random assignment of treatments to units
  • Random assignment, or randomization, minimizes the influence of confounding variables

Mount Everest example

Survival of climbers of Mount Everest is higher for individuals taking supplemental oxygen than those who don’t. Why?

Mount Everest example

  • One possibility is that supplemental oxygen (explanatory variable) really does cause higher survival (response variable).
  • The other possibility is that the two variables are associated because other variables affect both supplemental oxygen and survival.
  • Use of supplemental oxygen might be a benign indicator of a greater overall preparedness of the climbers that use it.
  • Variables (like preparedness) that distort the causal relationship between the measured variables of interest (oxygen use and survival) are called confounding variables
  • They are correlated with the variable of interest, and therefore prevent a decision about cause and effect.
  • With random assignment, no confounding variables will be associated with treatment except by chance.

Clinical Trials

  • The gold standard of experimental designs is the clinical trial.
  • Experimental design in all areas of biology have been informed by procedures used in clinical trials.
  • A clinical trial is an experimental study in which two or more treatments are assigned to human subjects.
  • The design of clinical trials has been refined because the cost of making a mistake with human subjects is so high.
  • Experiments on nonhuman subjects are simply called “laboratory experiments” or “field experiments”, depending on where they take place.

Example of a clinical trial

  • Transmission of the HIV-1 virus via sex workers contributes to the rapid spread of AIDS in Africa
  • The spermicide nonoxynol-9 had shown in vitro activity against HIV-1, which motivated a clinical trial by van Damme et al. (2002).
  • They tested whether a vaginal gel containing the chemical would reduce the risk of acquiring the disease by female sex workers.
  • Data were gathered on a volunteer sample of 765 HIV-free sex-workers in six clinics in Asia and Africa.
  • Two gel treatments were assigned randomly to women at each clinic.
  • One gel contained nonoxynol-9 and the other contained a placebo.
  • Neither the subjects nor the researchers making observations at the clinics knew who received the treatment and who got the placebo.

Example of a clinical trial

Design components of a clinical trial

The goal of experimental design is to eliminate bias and to reduce sampling error when estimating and testing effects of one variable on another.

  • To reduce bias, the experiment included:
    • Simultaneous control group: study included both the treatment of interest and a control group (the women receiving the placebo).
    • Randomization: treatments were randomly assigned to women at each clinic.
    • Blinding: neither the subjects nor the clinicians knew which women were assigned which treatment.
  • To reduce the effects of sampling error, the experiment included:
    • Replication: study was carried out on multiple independent subjects.
    • Balance: number of women was nearly equal in the two groups at every clinic.
    • Blocking: subjects were grouped according to the clinic they attended, yielding multiple repetitions of the same experiment in different settings (“blocks”).

Simultaneous control group

  • In clinical trials either a placebo or the currently accepted treatment should be provided.
  • In experiments requiring intrusive methods to administer treatment, such as injections, surgery, restraint, or confinement, the control subjects should be perturbed in the same way as the other subjects, except for the treatment itself, as far as ethical considerations permit.
  • The “sham operation”, in which surgery is carried out without the experimental treatment itself, is an example.
  • In field experiments, applying a treatment of interest may physically disturb the plots receiving it and the surrounding areas, perhaps by trampling the ground by the researchers.
  • Ideally, the same disturbance should be applied to the control plots.

Randomization

  • Once treatments are chosen, the researcher should randomize assignment to units or subjects.
  • Randomization means that treatments are assigned to units at random, such as by flipping a coin.
  • Chance rather than conscious or unconscious decision determines which units end up receiving the treatment of interest and which receive the control.
  • A completely randomized design is an experimental design in which treatments are assigned to all units by randomization.
  • Randomization breaks the association between possible confounding variables and the explanatory variable, allowing the causal relationship to be assessed.
  • Randomization doesn’t eliminate the variation contributed by confounding variables, only their correlation with treatment.
  • It ensures that variation from confounding variables is similar between the different treatment groups.

Randomization

  • Randomization should be carried out using a random process:
    • List all n subjects, one per row, in a computer spreadsheet.
    • Use the computer to give each individual a random number.
    • Assign treatment A to those subjects receiving the lowest numbers and treatment B to those with the highest numbers.
  • Other ways of assigning treatments to subjects are almost always inferior because they do not eliminate the effects of confounding variables.
  • “Haphazard” assignment, in which the researcher chooses a treatment while trying to make it random, has repeatedly been shown to be non-random and prone to bias.
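The spreadsheet procedure above takes only a few lines in R; a sketch for a hypothetical set of subjects:

```r
set.seed(2024)                     # for a reproducible assignment
n <- 20                            # hypothetical number of subjects
subjects <- data.frame(id   = 1:n,
                       rand = runif(n))   # one random number per subject

# Lowest half of the random numbers -> treatment A, highest half -> B
subjects$treatment <- ifelse(rank(subjects$rand) <= n / 2, "A", "B")

table(subjects$treatment)   # a balanced, completely randomized assignment
```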

Blinding

  • Blinding is the process of concealing information from participants (sometimes including researchers) about which subjects receive which treatment.
  • Blinding prevents subjects and researchers from changing their behavior, consciously or unconsciously, as a result of knowing which treatment they were receiving or administering.
  • For example, studies showing that acupuncture has a significant effect on back pain are limited to those without blinding (Ernst and White 1998).

Blinding

  • In a single-blind experiment, the subjects are unaware of the treatment that they have been assigned.
  • Treatments must be indistinguishable to subjects, which prevents them from responding differently according to knowledge of treatment.
  • Can also be a concern in non-human studies.
  • In a double-blind experiment the researchers administering the treatments and measuring the response are also unaware of which subjects are receiving which treatments.
    • Researchers sometimes have pet hypotheses, and they might treat experimental subjects in different ways depending on their hopes for the outcome.
    • Many response variables are difficult to measure and require some subjective interpretation, which makes the results prone to a bias.
    • Researchers are naturally more interested in the treated subjects than the control subjects, and this increased attention can itself result in improved response.

Blinding

  • Reviews of medical studies have revealed that studies carried out without double-blinding exaggerated treatment effects by 16% on average compared with studies carried out with double-blinding (JĂĽni et al. 2001).
  • Experiments on non–human subjects are also prone to bias from lack of blinding.
  • Bebarta et al. (2003) reviewed 290 two-treatment experiments carried out on animals or on cell lines. The odds of detecting a positive effect of treatment were more than threefold higher in studies without blinding than in studies with blinding.
  • Blinding can be incorporated into experiments on nonhuman subjects using coded tags that identify the subject to a “blind” observer without revealing the treatment (and who measures units from different treatments in random order).

Replication

  • The goal of experiments is to estimate and test treatment effects against the background of variation between individuals (“noise”) caused by other variables.
  • One way to reduce noise is to make the experimental conditions constant. Fix the temperature, humidity, and other environmental conditions, for example, and use only subjects that are the same age, sex, genotype, and so on.
  • In field experiments, however, highly constant experimental conditions might not be feasible.
  • Constant conditions might not be desirable, either.
  • By limiting the conditions of an experiment, we also limit the generality of the results—that is, the conclusions might apply only under the conditions tested and not more broadly.
  • Another way to make treatment effects stand out is to include extreme treatments and to replicate the data.

Replication

  • Replication is the assignment of each treatment to multiple, independent experimental units.
  • Without replication, we would not know whether response differences were due to the treatments or just chance differences between the treatments caused by other factors.
  • Studies that use more units (i.e. that have larger sample sizes) will have smaller standard errors and a higher probability of getting the correct answer from a hypothesis test.
  • Larger samples mean more information, and more information means better estimates and more powerful tests.
  • Replication is not about the number of plants or animals used, but the number of independent units in the experiment. An “experimental unit” is the independent unit to which treatments are assigned.
  • The figure shows three experimental designs used to compare plant growth under two temperature treatments (indicated by the shading of the pots). The first two designs are un-replicated.

Pseudoreplication

  • Replicates that are used for inferences at the incorrect scale.
  • Note - pseudoreplication is therefore context dependent
  • Easy to see in simpler designs, difficult in complex designs such as nested ANOVA.
  • Nested ANOVA data are often mis-analyzed as pseudoreplicates
  • Example - experimental fish in two different tanks, each tank receiving a different treatment. (observations not independent with respect to treatment)

Pseudoreplication

Balance

  • A study design is balanced if all treatments have the same sample size. Conversely, a design is unbalanced if there are unequal sample sizes between treatments.
  • Balance is a second way to reduce the influence of sampling error on estimation and hypothesis testing. To appreciate this, look again at the equation for the standard error of the difference between two treatment means.
  • Balance has other benefits. For example, ANOVA is more robust to departures from the assumption of equal variances when designs are balanced or nearly so.
  • For a fixed total number of experimental units, n1 + n2, the standard error is smallest when n1 and n2 are equal.
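A quick numerical check of that claim, assuming a common within-group variance:

```r
# Standard error of the difference between two means:
# SE = sqrt(s^2 / n1 + s^2 / n2)
se_diff <- function(s2, n1, n2) sqrt(s2 / n1 + s2 / n2)

s2 <- 4     # assumed common within-group variance
N  <- 40    # fixed total number of experimental units

# Compare a balanced split to increasingly unbalanced ones
se_diff(s2, 20, 20)   # balanced: smallest SE
se_diff(s2, 30, 10)
se_diff(s2, 35,  5)   # SE grows as the design becomes more unbalanced
```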

Blocking

  • Blocking is the grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned to experimental units.
  • Blocking essentially repeats the same, completely randomized experiment multiple times, once for each block.
  • Differences between treatments are only evaluated within blocks, and in this way the component of variation arising from differences between blocks is discarded.

Blocking

Paired designs

  • For example, consider the design choices for a two-treatment experiment to investigate the effect of clear cutting on salamander density.
  • In the completely randomized (“two-sample”) design we take a random sample of forest plots from the population and then randomly assign each plot to either the clear-cut treatment or the no clear-cut treatment.
  • In the paired design we take a random sample of forest plots and clear-cut a randomly chosen half of each plot, leaving the other half untouched.

Blocking

Paired designs

  • In the paired design, measurements on adjacent plot-halves are not independent. This is because they are likely to be similar in soil, water, sunlight, and other conditions that affect the number of salamanders.
  • As a result, we must analyze paired data differently than when every plot is independent of all the others, as in the case of the two-sample design.
  • Paired design is usually more powerful than completely randomized design because it controls for a lot of the extraneous variation between plots or sampling units that sometimes obscures the effects we are looking for.
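The paired analysis reduces to a test on the within-plot differences; a sketch with simulated salamander counts (the numbers are invented):

```r
# Hypothetical paired data: densities in the two halves of 8 plots;
# halves of the same plot are correlated by construction
set.seed(11)
clearcut <- rnorm(8, mean = 4, sd = 2)
control  <- clearcut + rnorm(8, mean = 1, sd = 1)

# Paired analysis
t.test(control, clearcut, paired = TRUE)

# Equivalent one-sample test on the differences
t.test(control - clearcut)
```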

Blocking

Paired designs

Blocking

Randomized complete block design

  • RCB design is analogous to the paired design, but may have more than two treatments. Each treatment is applied once to every block.
  • As in the paired design, treatment effects in a randomized block design are measured by differences between treatments exclusively within blocks.
  • By accounting for some sources of sampling variation blocking can make differences between treatments stand out.
  • Blocking is worthwhile if units within blocks are relatively homogeneous, apart from treatment effects, and units belonging to different blocks vary because of environmental or other differences.
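In R a randomized complete block analysis simply adds the block factor to the model; a sketch with hypothetical data:

```r
# Hypothetical RCB layout: 3 treatments, each applied once in 6 blocks
set.seed(7)
d <- data.frame(
  block     = factor(rep(1:6, each = 3)),
  treatment = factor(rep(c("T1", "T2", "T3"), times = 6)),
  y         = rnorm(18)
)

# Block enters as a factor; treatment effects are evaluated within blocks
fit <- aov(y ~ treatment + block, data = d)
summary(fit)   # rows for treatment (2 df), block (5 df), residuals (10 df)
```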

Blocking

Randomized complete block design

  • For example, Srivastava and Lawton (1998) made artificial treeholes from plastic that mimicked the buttress tree holes of European beech trees to examine how the amount of decaying leaf litter affected the number of insect eggs deposited (mainly by mosquitoes and hover flies) and the survival of the larvae.
  • In one treatment (LL), a low amount of leaf litter was provided. In a second treatment (HH), a high level of debris was provided. In the third treatment (LH), leaf litter amounts were initially low but were then made high after eggs had been deposited.
  • A randomized block design was used in which artificial tree holes were laid out in triplets (blocks). Each block consisted of one LL tree hole, one HH tree hole, and one LH tree hole.
  • The location of each treatment within a block was randomized.

Blocking

Randomized complete block design

What if you can’t do experiments?

  • Experimental studies are not always feasible, in which case we must fall back upon observational studies.
  • The best observational studies incorporate as many of the features of good experimental design as possible to minimize bias (e.g., blinding) and the impact of sampling error (e.g., replication, balance, blocking, and even extreme treatments) except for one: randomization.
  • Randomization is out of the question, because in an observational study the researcher does not assign treatments to subjects. Instead, the subjects come as they are.
  • Two strategies are used to limit the effects of confounding variables on a difference between treatments in a controlled observational study: matching, and adjusting for known confounding variables (covariates).